King's Research Portal

A Review of 2010 for PLoS Computational Biology

Author: Andrew M. Collings
Cecy Marden
DB Searls
G Chalancon
Philip E. Bourne
RF Doolittle
Rosemary Dickin
RS Datta
Ruth Nussinov
Publication venue: Public Library of Science

arXiv.org e-Print Archive

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

FigShare

Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Author: BFJ Manly
CI Castillo-Davis
David Johnson
DB Searls
DD Womble
E Badidi
F Antequera
J Krueger
J Theilhaber
JD Wren
JF Costello
JM Claverie
Jonathan D Wren
JR Quinlan
K Davies
K Nakai
L Stein
Le Gruenwald
LV Zhang
M Ashburner
M Gardiner-Garden
M Safran
P Clark
RS Michalski
S Foissac
S Muggleton
SP Shah
TV Venkatesh
V Bajic
W Frawley
WM Shui
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2005

There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and was successful in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.
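The sequence-matrix idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the half-open interval convention, the toy coordinates, and the use of a Jaccard overlap score as a stand-in for the paper's correlative analyses (the abstract does not say which statistics the prototype uses, and the classification-tree step is omitted).

```python
from itertools import combinations

def build_sequence_matrix(length, features):
    """Return {row_name: [0/1 per position]} from half-open (start, end) intervals.

    Hypothetical layout: one row per annotation type, one column per base.
    """
    matrix = {}
    for name, intervals in features.items():
        row = [0] * length
        for start, end in intervals:
            for pos in range(max(start, 0), min(end, length)):
                row[pos] = 1
        matrix[name] = row
    return matrix

def jaccard(row_a, row_b):
    """Share of annotated positions that both rows cover (0.0 if neither covers any)."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0

def coinciding_features(matrix, threshold):
    """List row pairs whose positional overlap reaches the threshold."""
    hits = []
    for a, b in combinations(sorted(matrix), 2):
        score = jaccard(matrix[a], matrix[b])
        if score >= threshold:
            hits.append((a, b, score))
    return hits

# Toy annotations on a 14-base sequence (made-up coordinates).
features = {
    "gene": [(2, 12)],
    "exon": [(2, 5), (8, 12)],
    "cpg_island": [(0, 4)],
}
matrix = build_sequence_matrix(14, features)
hits = coinciding_features(matrix, 0.5)
print(hits)
```

With these toy intervals only the exon and gene rows overlap strongly enough to be reported; a real analysis would choose the comparison per pair of row types, as the abstract's classification tree suggests.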

Genoviz Software Development Kit: Java tool kit for building genomics visualization applications

Author: AE Loraine
Ann E Loraine
BJ Haas
Cyrus Harmon
D Huntley
DB Searls
Ed Erwin
EL Sonnhammer
Eric Blossom
GA Helt
Gregg A Helt
John W Nicol
JW Nicol
MS Cline
NL Harris
P Aldhous
RC Holland
S Fischer
S Hoon
Stephen A Chervitz
Steven G Blanchard
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. Results: The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats; and support for adaptive, semantic zooming, whereby objects change their appearance depending on zoom level and zooming rate adapts to the current scale. Freely available demonstration and production quality applications, including the Integrated Genome Browser, illustrate Genoviz SDK capabilities. Conclusion: Separation between graphics components and genomic data models makes it easy for developers to add visualization capability to pre-existing applications or build new applications using third-party data models. Source code, documentation, sample applications, and tutorials are available at http://genoviz.sourceforge.net/.

Public Library of Science (PLOS)

The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences

Author: A Madlung
A Via
AM Bentley
AM Depelteau
B Bloom
B Efron
CA Brewer
DB Searls
DL Vaux
DM Windish
EB Speth
J Schell
JL Gutlerner
Joanne A. Fox
Johanna L. Gutlerner
JP Simmons
KV Thompson
LA Steen
LJ Gross
M Colon-Berlingeri
Melanie I. Stefan
Michael Springer
MJ Costa
N Uchida
Richard T. Born
SJ Eglen
TG Smolinski
U Fuller
Publication venue: Public Library of Science (PLoS)
Publication date: 01/04/2015

The past decade has seen a rapid increase in the ability of biologists to collect large amounts of data. It is therefore vital that research biologists acquire the necessary skills during their training to visualize, analyze, and interpret such data. To begin to meet this need, we have developed a “boot camp” in quantitative methods for biology graduate students at Harvard Medical School. The goal of this short, intensive course is to enable students to use computational tools to visualize and analyze data, to strengthen their computational thinking skills, and to simulate and thus extend their intuition about the behavior of complex biological systems. The boot camp teaches basic programming using biological examples from statistics, image processing, and data analysis. This integrative approach to teaching programming and quantitative reasoning motivates students’ engagement by demonstrating the relevance of these skills to their work in life science laboratories. Students also have the opportunity to analyze their own data or explore a topic of interest in more detail. The class is taught with a mixture of short lectures, Socratic discussion, and in-class exercises. Students spend approximately 40% of their class time working through both short and long problems. A high instructor-to-student ratio allows students to get assistance or additional challenges when needed, thus enhancing the experience for students at all levels of mastery. Data collected from end-of-course surveys from the last five offerings of the course (between 2012 and 2014) show that students report high learning gains and feel that the course prepares them for solving quantitative and computational problems they will encounter in their research. We outline our course here which, together with the course materials freely available online under a Creative Commons License, should help to facilitate similar efforts by others.

Harvard University - DASH

Edinburgh Research Explorer

FigShare

Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars

Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating a set of parts to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology.
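As a rough illustration of the attribute-grammar idea in this abstract, the sketch below attaches numeric attributes to parts, checks a single structural rule, and derives an expression rate for every construct in a small design space. The part names, attribute values, the one fixed rule (promoter, RBS, gene, terminator) and the multiplicative rate formula are all hypothetical; the paper's grammars and its multi-pass compilation are not reproduced here.

```python
from itertools import product

# Hypothetical part library: each category maps part names to one attribute
# (None where this sketch attaches no numeric attribute).
LIBRARY = {
    "promoter": {"pWeak": 0.1, "pStrong": 1.0},   # assumed transcription rates
    "rbs": {"rbsLow": 0.5, "rbsHigh": 2.0},       # assumed translation rates
    "gene": {"gfp": None},
    "terminator": {"t1": None},
}

# Single structural rule assumed here: cassette -> promoter rbs gene terminator
CASSETTE_ORDER = ("promoter", "rbs", "gene", "terminator")

def parse_cassette(parts):
    """Check the part sequence against the rule and label each part by category."""
    if len(parts) != len(CASSETTE_ORDER):
        raise ValueError("a cassette needs exactly four parts")
    node = {}
    for category, name in zip(CASSETTE_ORDER, parts):
        if name not in LIBRARY[category]:
            raise ValueError(f"{name!r} is not a known {category}")
        node[category] = name
    return node

def translate(parts):
    """Combine part attributes into a tiny model of the cassette."""
    node = parse_cassette(parts)
    transcription = LIBRARY["promoter"][node["promoter"]]
    translation = LIBRARY["rbs"][node["rbs"]]
    return {
        "protein": node["gene"],
        "transcription_rate": transcription,
        "translation_rate": translation,
        # Assumed rule: expression scales with the product of both rates.
        "expression_rate": transcription * translation,
    }

# Enumerate and translate every construct the library can generate.
design_space = list(product(*(LIBRARY[c] for c in CASSETTE_ORDER)))
models = {parts: translate(parts) for parts in design_space}
for parts, model in models.items():
    print(parts, model["expression_rate"])
```

Enumerating the full design space mirrors the validation strategy the abstract describes, although here the "simulation" is reduced to a single derived number per construct.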

Public Library of Science (PLOS)

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

IgTM: An algorithm to predict transmembrane domains and topology in proteins

Author: B Mathews
C Pasquier
D Angluin
D Lopez
Damián López
DB Searls
DT Jones
E Wallin
EE Pashou
ELL Sonnhammer
EM Gold
GE Tusnády
H Viklund
J Berstel
JE Hopcroft
JM Sempere
L Käll
LR Murphy
M Burset
M Ikeda
M Punta
Marcelino Campos
MM Gromiha
NS Sadovskaya
P Fariselli
P García
P Peris
PG Bagos
Piedachu Peris
R B
S Jayasinghe
S Mitaku
S Möller
T Knuutila
T Li
T Yokomori
Publication venue: BioMed Central
Publication date: 01/09/2008

Background: Owing to their roles as receptors or transporters, membrane proteins play a key part in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages. Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets were used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside or outside the membrane into it, and vice versa) allowed us to obtain the best results. This software is publicly available at: http://www.dsic.upv.es/users/tlcc/bio/bio.html Conclusion: We compared our results with other well-known methods, which obtain slightly better precision. However, this work shows that it is possible to apply Grammatical Inference techniques in an effective way to bioinformatics problems.
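For readers unfamiliar with even linear languages, a standard result in grammatical inference is that they can be reduced to regular languages by "folding" each string, pairing its i-th symbol from the left with its i-th symbol from the right. The sketch below shows only that folding step. Whether IgTM applies this exact reduction is an assumption, since the abstract does not say, and the function names are invented for illustration.

```python
def fold(word):
    """Pair the i-th symbol from the left with the i-th symbol from the right.

    An odd-length word keeps its middle symbol as a 1-tuple.
    """
    pairs = []
    left, right = 0, len(word) - 1
    while left < right:
        pairs.append((word[left], word[right]))
        left += 1
        right -= 1
    if left == right:
        pairs.append((word[left],))
    return pairs

def unfold(pairs):
    """Invert fold(): rebuild the original string from its folded form."""
    left = "".join(p[0] for p in pairs)
    right = "".join(p[1] for p in reversed(pairs) if len(p) == 2)
    return left + right

def is_mirror(word):
    """Membership in {w + reversed(w)}, an even linear language.

    After folding, the test is a left-to-right scan over pairs, which is the
    kind of check a finite automaton can perform.
    """
    return all(len(p) == 2 and p[0] == p[1] for p in fold(word))

print(fold("abcde"))
print(is_mirror("abba"), is_mirror("abca"))
```

Once strings are folded, learners designed for regular languages can be applied to the sequences of pairs, which is what makes this class attractive for inference.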

eScholarship - University of California

Lessons from the CAGI-4 Hopkins clinical panel challenge

Author: Adhikari A
Buckley BA
Carraro M
Chandonia J-M
Chhibber A
Cutting GR
Fu Y
Gasparini A
Jones DT
Kramer A
Kundu K
Lam HYK
Leonardi E
Moult J
Pal LR
Searls DB
Shah S
Sunyaev S
Tosatto SCE
Yin Y
Publication date: 01/01/2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting the probability that each patient had each of fourteen possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by the different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false positive rate of DNA-guided analysis in the absence of prior phenotypic indication.
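The percentages quoted in this abstract can be checked with a few lines of arithmetic. The counts come from the abstract; the combined figure at the end is derived here and is not a number the authors report, and the variable names are illustrative.

```python
def percent(hits, total):
    """Percentage of hits, rounded to the nearest whole number."""
    return round(100 * hits / total)

# (patients whose disease class at least one predictor got right, patients in subgroup)
with_variant = (36, 43)      # Hopkins laboratory reported a variant
without_variant = (39, 63)   # Hopkins laboratory did not find a variant

total_patients = with_variant[1] + without_variant[1]
total_correct = with_variant[0] + without_variant[0]

print(percent(*with_variant))      # 84
print(percent(*without_variant))   # 62
print(total_patients)              # 106
# Derived here, not reported in the abstract:
print(percent(total_correct, total_patients))  # 71
```

The two subgroups add up to the 106 patients mentioned earlier in the abstract, so the reported figures are internally consistent.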

UCL Discovery

Archivio istituzionale della ricerca - Università di Padova

Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

Author: A Moura
ACE Darling
AL Delcher
CJ van Rijsbergen
D Frishman
DA Rowe-Magnus
DB Searls
E Rivas
Enrico Coiera
F Baquero
F Meyer
Guy Tsafnat
H Quesneville
HW Stokes
IT Paulsen
J Fleiss
J Landis
Jaron Schaeffer
Jon R Iredell
K Rutherford
L Stein
M Ashburner
M Kanehisa
MA Andrade
MJ Joss
R Overbeek
RM Hall
RS Levings
S Ji
S Leung
Sally R Partridge
SF Altschul
SR Partridge
U Bohnebeck
WR Pearson
Y Boucher
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method to computationally discover higher-order DNA structures is presented, using a context-sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems, which operate at the sequence level.